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Abstract 

This is a manual (built by examples) to explain the use of MDP_QCD [|. It 
consists of an ensemble of classes and functions (written in GNU C++) to 
help in writing programs for lattice QCD in a particularly Object Oriented 
fashion. Some tricks are implemented, hidden in the class definition, to 
optimize speed and reduce memory usage on PCs, workstations and parallel 
computers with sheared memory. 



^ The code described can be dowloaded from the web page: 

www . hep . phys . soton . ac . uk/hepwww/postgrad/M . DiPierro/sof tware . html 

It can be freely copied and used as long the name of the author is retained. 
The author declines any responsibility for any improper or unauthorized use of this software. 
If you find any bug in the code, or you have any suggestion, please report it to the author. 
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1 Introduction 



This paper describes MDP_QCD, a collection of classes and functions written in 
GNU0 C++ to provide a framework to develop fast and efficient code for 

lattice simulation of SU{Nc) gauge theories (with Wilson or Sheikoleslami- 
Wolhert [|] P fermions) . For references see @ 0] HI 
The main characteristics of MDP_QCD are: 

• Implementation of the standard mathematical language used by physicists. 

• It is based on Toy_class and Matrix (a class for matricial manipulations), 
both written by the author. The assignment operator and the copy con- 
structor have been overloaded in a new way to reduce the necessity of copy- 
ing when an object containing a pointer to dynamically allocated memory 
is returned by a function. 

• Four main classes: gauge_f ield, pl_f ield, em_f ield and f ernii_f ield, all 
based on the same optimization trick used in Toy_class. The basic binary 
operators have been overloaded for these classes. 

• An advanced random number generator, including a generator of random 
SU{Nc) matrices. 

• Standard simulation algorithms: a quenched mult ihit () Metropolis mon- 
tecarlo and a mul_invQ() to invert the fermionic matrix. 

• Implementation of stochastic all-to-all propagators for light quarks. 

• An optimized compressor for saving the gauge configurations. 

• Everything works for an arbitrary number of colors, Nc- 

The main idea on which the code has been built is the solution of the problem 
of returning objects containing pointers to dynamically allocated memory (real- 
ized in class Matrix, Toy_class and inherited by gauge_field, pl_field, 
eni_f ield and f ermi_f ield). 

Consider the following code: 

Matrix A,B(10, 10) ; 

A=B; 

A=B*A; 

^ It has been tested with GNU gpp compiler on Solaris, Linux and Windows NT. Modifica- 
tions may be needed for a different compiler. 
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In the first assignment one wants each element of B to be copied in the corre- 
sponding element of A. In the second assignment it is faster if B and A are passed 
to the local variables of the operator* by reference (i.e. without copying all the 
elements). Moreover one wants the local variable created by the operator* to 
occupy the same memory location as the variable that is returned (this avoids 
copying and wasting space). To implement this idea each Matrix object contains 
a FLAG and a pointer to dynamically allocated memory (where the numbers are 
stored). The copy constructor and the operator= have been overloaded in such 
a way to take care of the status of the FLAG and eventually to copy the pointer 
to the memory, instead of copying the memory containing the real matrix. 

A physical location of memory may be pointed by different Matrix objects, 
but this never generates confusion if the safety rules (stated in appendix D) 
are followed. An automatic system of garbage collecting deallocates the unused 
memory when there are no objects alive pointing to it. 

In this way, in the first assignment of the example, 11 + 800 bytesQ are copied, 
while in the second assignment only 11 bytes are copied three times (when passing 
A and B to the operator* () and when returning the result) to be compared 
with the 800 bytes of the matrix. The pointer of A is swapped only when the 
multiplication is terminated without generating confusion between the input and 
the output of the function. This is faster than it would be possible in FORTRAN. 
In FORTRAN it would be necessary to create a temporary array where to store 
the result of the multiplication and then copying it into A. 

The FLAG takes care of everything and the procedure works in every possible 
recursive expression. 

One more optimization is contained in the following code: 

gauge_f ield U; 
f ermi_f ield psi; 
psi(x)=U(x,mu)*psi(x) ; 

In this example U(x,mu) and psi(x) are objects of class Matrix but each 
of them, instead of containing a pointer to new dynamically allocated memory, 
contains pointer to the same location of memory allocated by the constructor of U 
or psi. In other words: every time the programmer introduces a new object, the 
compiler knows if it is necessary to allocate new memory for it or just superimpose 
it to an existing allocated portion of memory. 

All the details about these tricks are hidden from the high level programmer, 
who does not even need to know what pointers are. 



•^The number 11 is the size in bytes of a Matrix object. In is independent from the size of 
the memory occupied by the "real" matrix. 
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2 MDP_Libl.h and its usage 



Five classes are declared in the file MDP_Libl .h 0: 

• Complex. Declared as complex<f loat>. The imaginary unit is imple- 
mented global constant I=Complex(0, 1). 

• Toy_class. It has no explicit application. It contains all the tricks to 
minimize the memory usage and maximize speed. Its members are inherited 
by the fields defined in MDP_QCD.h. The programmer does not need to 
know about the structure of Toy_class and for this reason its description 
is postponed to appendix C and D. 

• Matrix. An object belonging to this class is effectively a complex matrix 
and it may have arbitrary dimensions. It is based on the same optimization 
techinique described in appendix C and D. 

• Randoni_generator. This class contains public member functions to gen- 
erate random long, float, double numbers and random SU{n) matrices. 
Random is a global object belonging to this class and it can be used to gain 
access to any member function. 

• JackBoot. It enables to compute Jackkinfe and Boostrap errors in a par- 
ticularly easy and general Object Oriented way. 

2.1 class Matrix 

A matrix object, say M, can be declared in two different ways 

Matrix M(r,c); // r rows times c columns 

Matrix M; //a general matrix 

Even if the size of a matrix has been declared it can be changed anywhere with 
the command 

M. dimension (r, c) ; // r rows times c columns 

Any matrix is automatically resized, if necessary, when a value is assigned to it. 
The following lines are correct and print 8 , 8 

Matrix M(5,7) , A(8,8) ; 
M=A; 

printf ("y.i,y.i\n" , M.rowmaxO ,M.colmax()) ; 

''The classes defined in the file MDP_Libl . C have already been used in some analysis programs 
written by the Southampton Theory Group 
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The memeber functions rowmaxO and colmaxO return respectively the number 
of rows and columns of a Matrix. 

The element of a matrix M can be accessed with the natural syntax 

M(i,j) 

Moreover the class contains functions to perform standard operations: 

+, -, *, /, +=, -=, *=, /=, inv, det , exp, sin, cos, log, 
transpose, hermitiajn, minor, identity,... 

As an example a program to compute 
is the following 

// Program saved in FILE: pOl.C 
#include "MDP_Libl.h" 
int mainO { 

Matrix a(2,2) ; 

a(0,0)=2; a(0,l)=3; 

a(l,0)=4; a(l,l)=5*I; 

print(inv(exp(a) ) ) ; 

return 0; 



It is straightforward to add new functions copying the prototype 

Matrix f (Matrix a, Matrix b, ...) { 
Matrix M; 
// body, 
prepare (M) ; 
return M; 

>; 

One more example of how to use the class Matrix follows 

// Program saved in FILE: p02.C 
#include "MDP_Libl .h" 
Matrix cube (Matrix X) { 

Matrix Y; 

Y=X*X*X; 

prepare (Y) ; 

return Y; 



^ A list of general safety rules to be applied when using the class Matrix can be found in 
Appendix D. 
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Example 


C++ with MDP_Libl.h 


A G M V (C) 


A dimens ion Cr 


A- 

'J 


A(i i) 


A = B + C ~ D 


A=B+C-D 




A=B*C 


^(ij) ^ ]^(jk)(j(ik) 


A=mul_left(B,C) 


A = aB + C 


A=a*B+C 


A = al + B -bl 


A=a+B-b 


A = B^C-^ 


A=transpose(B) *inv(C) 


A = B^exp{iC) 


A=hermitian(B)*exp(I*C) 


A = cos(5) + i sm{B) * C 


A=cos(B)+I*sin(B)*C 


a = Tea\{tr{B-^C)) 


a=real (trace (inv(B) *C) ) 


a = det{B) * det{B'^) 


a=det (B) *det ( inv (B) ) 



Table 1: Examples of typical instructions acting on Matrix objects. A,B,C,D are 
assumed to be declared as Matrix; r,c as int; a,b may be any kind of number. 



int mainO { 
Matrix A,B; 
A=Random.SU(3) ; 
B=cube (A) *exp (A) +inv (A) ; 
print (A) ; 
print (B) ; 
return ; 

>; 

This code prints on the screen a random SU(3) matrix A and B = A^e^ + A. 
Some example statements are listed in tab. |l]. Note the command 

A=mul_left(B,C) ; 

that is equivalent but faster than 

A=C*transpose (B) ; 

It will be useful to multiply a fermionic field (seen as a set of color x spin matries) 
by a spin structure. 

2.2 class Random_generator 

It is possible to declare different objects belonging to this class but this should be 
done only if one needs many independent random generators (in the sense that 
they relay on different seeds). For all the applications considered from now on, 
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only one random generator will be needed. To this scope a global object called 
Random is automatically declared when MDP_Libl.h is included. Through this 
global object it is possible to access any member function of the class. 

• void Random. read_seed( char filename []); It reads the seed from the 
file filename and initializes the random generator. If no filename is pro- 
vided the default filename is RandomBuf f er . seed. If the program does not 
find the file with the seed it automatically creates it with some default 
values in it. 

• void Random. write_seed(char filename []); It writes the last computed 
seed into the file filename. The default file name is RandomBuf fer . seed. 

• long Random. Long (long n) ; It returns a random long number between 
and n-1. 

• float Random . Float () ; It returns a random float number between and 
1. 

• double Random. Double ; It returns a random double number between 
and 1. 

• float Random. Gauss ; It returns a random float number x generated 
with a Gaussian probability P{x) = exp(— 2:^/2). 

• float Random. Floatp (float (*P) (float, void*), void* a) ; It returns 
a random number x generated with a Gaussian probability P (x , a) , where 
a is any set of parameters pointed by a (the second argument passed to 
Floatp). 

• Matrix Random . SU ( int n) ; it returns a random Matrix in the group 
SU{n) or f/(l) if the argument is n = 1. 

The algorithm for SU(2) is based on the observation that each rotation in the 
3D space (in the group 0(3)) can be represented by a direction a E S"^ and a 
rotation angle around that direction, a G [0, vr). Therefore first a random element 
of 0(3) is generated, second, it is mapped to SU{2). The map between 0(3) and 
SU (2) is realized by 

{a, a} — i> exp{iaa ■ cr) = cos(a) + ia ■ a sin(a;) (2) 

where (cr^, o"^, a^) is a vector of Pauli matrices. 



The algorithm for SU{n > 2) is the Cabibbo-Marinari iteration based on 
SU{2). 
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2.3 class JackBoot 



Suppose one has n gauge configurations and, on each configuration one has mea- 
sured m float quantities a[0] , a[l] ,...,a[m-l] . Then one wants to compute the 
average, over the gauge configurations, of a function F (float *a) with its Jack- 
knife and Bootstrap errors |14]. A simple way to do it is to define a JackBoot 
object, let's call it jb. It is a sort of container for the data. After it has been 
filled it can be asked to return the mean of F() and its errors. Here is an example 
of how it works: 



// object declaration: 
JackBoot<f loat , m, n> jb; 
// assigning the function: 

jb.f=F; // note that f is a member variable 

// while F is the user defined function 
// filling the object: 
for(jb.elem=0; jb.elem<n; jb.elem++) { 

// somehow measure a[0] , . . . a[m-l] on the gauge configs 

jb(0)=a[0] ; 

jb(l)=a[l] ; 

jb(m-l)=a[m-l] ; 

>; 

// printing the results 
printf ("Result 
printf (" Jackknif e error 
printf ("Bootstrap error 



= 7,f\n", jb.meanO); 
= 7,f\n", jb.j_err()); 
= 7.f\n", jb.b_err(100)) ; 



Note that 

• The class JackBoot<. . .> jb is defined with a template where the first 
argument between angular brakets is the basic variable type (in the example 
it is float), the second is the size of the array to be passed to f (), the 
third is the number of confugurations to be processed. 

• jb.f is the pointer to the function used in the analysis. 

• jb . elem is an integer that must be used as a counter on the configurations. 

• jb . operator (int n) is used to access data in the object (a configuration 
number in jb.elem must be specified). 

• jb.meanO returns the mean. 

• jb. j_err() returns the Jackknife error. 

• jb.b_err() returns the Bootstrap error. It takes as argument the number 
of Bootstrap samples. The default value is 200. 
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It is important to stress that the class JackBoot can contain objects of any 
class T, but the functions b_err is not properly defined unless the operator 
T: : operator> (class T) is declared. In other words one cannot perform a Boos- 
trap of functions of matrices unless one defines some kind of ordering relation 
between matrices. I suggest to play safe and perform only Boostraps of float 
numbers: JackBoot<f loat ,n,m> 

It is possible to declare arrays of JackBoot objects, but it is rarely necessary. 
It is simpler to declare different functions and repeat the analysis using the same 
JackBoot object assigning the pointer JackBoot : : f to each of the functions at 
the time. 

As another example consider the following program. It generates an array of 100 
SU{6) matrices. For each matrix it computes trace and determinant, and returns 
the average of the ratio between the real part of the trace and the real part of 
the determinant (with its Jackknike and Bootstrap errors). 

// Prograin saved in FILE: p03.C 
#include "MDP_Libl .h" 
#define ng 100 

float f (float *x) { return x[0]/x[l]; }; 
int mainO ■[ 
Matrix A; 

JackBoot<f loat , 2 ,ng> jb; 
for(jb.elein=0; jb.elein<ng; jb.elein++) { 

A=Randoin.SU(6) ; 

jb(0)=real(trace(A)) ; 

jb(l)=real(det(A)) ; 

}; 

jb.f=f ; 

printf ("Result = 7.f \n" , jb.meanO); 

printf (" Jackknif e error = 7.f\n", jb.j_err()); 
printf ("Bootstrap error = °/.f\n", jb.b_err(100) ) ; 
return 0; 

}; 

The output is 

Result = 0.036311 

Jackknif e error = 0.087398 
Bootstrap error = 0.085979 

3 MDP_QCD.h and its usage 
3.1 Basic Syntax 

The most general C++ program using MDP_QCD must have the structure 
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#include "MDP_qCD.h" 

// declaration of functions 

int mainO { 
start ; 

// initialize beta=. . . 
// kappa= . . . 

// c_SW=... 
// main body 

stopO ; 
return ; 

>; 

The file MDP_QCD.h includes MDP_Libl.h and MDP_Settings .h. Any other 
header needed is included by MDP_Libl .h. 

The file MDP_Settings .h contains the declaration of those global constants 
which can be modified by the user: size of the lattice and the number of colors. 
A typical settings file is 

#define NxO 6 // temporal sites 

#define Nxl 4 // spatial sites in the x direction 

#define Nx2 4 // spatial sites in the y direction 

#define Nx3 4 // spatial sites in the z direction 

#define Nc 3 // number of colors 

#define Nfermi 10 // used by light_propagator 

Note that it is sufficient to change the value of Nc and recompile the code to 
change the gauge group. No other line has to be modified. 

The instructions start () ; and stopO ; are compulsory and do the job of 
reading/writing the seed and initializating all the global variables and matrices. 
Remember that if the file containing the seed does not exist it is automatically 
created. 

All the global matrices are listed in Appendix A together with the conventions 
used by MDP_QCD. Any other built-in global variable is listed in (0). 
The basic objects in which lattice QCD is usually formulated are 

• Matrices (class Matrix) 

• Links (class gauge_f ield) 

• Plaquettes (class pl_f ield) 

• Chromo-electro-magnetic tensor (class eni_f ield) 

• Fermionic field (class f ernii_f ield) 
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• Fermionic light propagator (class light -propagator) 



To explain the power of this language, here is an example of a program to 
create one gauge configuration and test if the hnks are in SU (iVg) (where Nc is 
declared in MDP_Settings .h) 

// Program saved in FILE: p04.C 
#include "MDP_QCD.h" 
int mainO { 
start ; 

int mu, counter=0; 
site x; 
Matrix A; 
gauge_field U; 
U=hot() ; 
beta=5.7; 
multihit(U) ; 
for(x=0; x<Nvol; x++) 

for(mu=0; iiiu<4; mu++) { 

A=U(x ,mu) *hermiticin(U(x,mu) )-l ; 

if (real (trace (A*hermitian(A) ) ) >0 . 0001) 
counter++; 

>; 

if (counter==0) printfC'All links are in SU(N)\n"); 
stopO ; 
return 0; 

}; 

It produces the following output: 



PROGRAM MDP_QCD. GENERAL INFORMATIONS: 



Created by Massimo Di Pierro (0 1998) 
Version n. = 8-9-1998 
Compiling date = Oct 16 1998 
Compiling time = 20:57:58 
Num. of colors = 3 

Lattice size = 6(t) x 4(x) x 4(y) x 4(z) = 384 



PROGRAM STARTING: 



Setting indices 
Defining the matrices 
Creating SU(N) generators 
Readind Random. seed 

Creating initial hot configuration. . . 
Multihit step n.O, beta=5. 700000. . . 
... Efficiency: 83.407. 
All links are in SU(N) 
Writing Random. seed 
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PROGRAM END. 



In the rest of this manual the follwing declarations will be assumed 



int i,j,k; \\ color indices 

int a,b,c; \\ spinor indices 

int mu,nu; \\ lattice directions 

site x,y,z; \\ lattice sites 

Matrix A,B,C; \\ generic matrices 

gauge_field U; 

pl_field P; 

ein_field G; 

fermi_field chi, psi, phi; 



light_propagator S; 

The basic unary and binary operators have been overloaded. The use and 
power of the basic classes can be better explained with a the few example lines 
of tab. d). 

To handle properly objects belonging to any of the classes gauge_f ield, 
pl_field, eni_field and fermi_field the safety rules described in Ap- 
pendix D must be followed. 

3.2 Creating gauge configurations 

Any standard lattice simulation begins with the creation of an ensemble of gauge 
configurations {f/*}. It is created through a Markov process^, i.e. each con- 
figuration W is generated from the preceding one, using any stochastic 
algorithm 

U' = F{U'^^) (3) 

satisfying the relation 

eS\u^-'\p{U'~^ ^ U') = e-^^'^'^PiU' ^ U'-^) (4) 

where P{U —>■ U') is the probability of generating the configurations U' from the 
configuration U . Here S\U] is the lattice Euclidean action evaluated on the gauge 
configuration U . In MDP_QCD the standard quenched action is implemented 

'^[^] = E |- E [ (1 - U,{x)U,{x + ^^)Ul{x + v)Ul{x))] (5) 

where (3 = l/g'^{a) is the parameter that fixes the lattice scale. The action in 
eq. is a discretized version of the pure Yang-Mills action, i.e. fermion loops 
are neglected. 
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Example 


C++ with MDP_QCD 


returned type 


y = X + n 


y=up[x] [mu] 


site 


y = x-n 


y=dw[x] [nu] 


site 


y = X + 10/i 


y=move(x,mu, 10) 


site 


y = X + S^L + 5u 


y=move (x , mu , 3 , nu , 5) 


site 


a = \x — y\ 


a=dist(x,y) 


float 


X = {xo,Xi,X2,X3) 


x=position(xO,xl ,x2,x3) 


site 


U,{x) 


U(x,niu) 


Matrix 


Ul^ix) 
ip{x) 


U(x,mu,i, j) 


Complexfe 


psi(x) 


Matrix 




psi(x,i, alpha) 


Complexfe 


U;^{x) = Ul\y)U'/{z) 


U(x,mu)=U(y,nu)*U(z,ro) 


void 


U^{x)U^{x + u)U^{x + li) 


staple_up(U,mu,nu) 


Matrix 


U-y{x)Ufj,{X - v)Uy{X + /i - Z/) 


staple_dw (U , mu , nu) 


Matrix 




plaquette (U , x , mu , nu) 


Matrix 


ct>{x) = U,{y)i,{z) 


phi(x)=U(y,mu)*psi(z) 


void 


(l + 7^)^/'(z) 


mul_left(l+Gamma[mu] ,psi(z)) 


Matrix 




psi*phi 


Complex 


= Q[u]x 


psi=mul_Q (U,G, chi) 


void 




psi=mul_invQ(U,G, chi) 


void 




psi=mul_q (U , G , chi , , DAGGER) 


void 


^ = (Qt[f/])-V 


psi=mul_invQ(U,G,chi, , DAGGER) 


void 



Table 2: Example lines in C++ with MDP_QCD. Note that Q[..] is the fermionic 
matrix derived from the Sheikoleslami-Wolhert action. Its explicit expression is 
given in eq. (0). 
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The initial configuration f/° can be chosen to be "cold", i.e. when all its links 
are the identity, or "hot" , when each link is a random SU (N) matrix. 

Since it is very important to reduce the correlation between ?7*~^ and W, it 
is necessary to iterate eq.(|^) many times 

W = F{F{...F{U'-^)...)) (6) 

Any quantum gauge observable 0[U] can be measured as an an average over 
the ensemble of gauge configurations 

{0[U]) = J [dU]0[U]P[U] ^ -^J20[U'] (7) 

In MDP_QCD a gauge configuration U is an object of the class gauge_f ield. It 
is declared with the command 

gauge_field U; 

An initial hot configuration is created with the command 

U=hot() ; 

and an initial cold configuration is created with the command 

U=cold() ; 

The operation described in eq.(0) and eq.(^) is performed using the multi- 
hit algorithm that is described in the section on algorithms. The command to 
perfrom it is 

// beta=. . . 

multihit (U,num_hits ,nmn_iter) ; 

num_hits is the number of hits for each multihit step. num_iter is the number of 
multihit iterations. A value for the gloabal variable beta must be specified. The 
new configuration is stored in the same object U which contains the configuration 
which is passed as argument. This saves memory. In principle it is possible to 
define arrays of configurations and define many different gauge configurations at 
the same time 

gauge_field myconf [100] [3] ; 
gauge_field U, Ul, U2, U3; 

and copy one into the other as if they were ordinary variables 

myconf [37] [2]=U; 
U2=U; 

U=myconf [50] [0] ; 
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To avoid ambiguities the assignment operator of any kind of field returns void. 

Links Ufj,{x) can be accessed as matrices U(x,mu) or as complex numbers 
Wj{x), U(x,mu,i,j). It is also possible to access a link as a matrix and ex- 
tract its element U(x,mu)(i,j) but this is slower than the direct access 
U(x,mu,i, j). 

Here is a program that generates and saves an ensemble of 100 gauge con- 
figurations, at P — 6, starting from hot, performing 20 multi-hit steps at the 
time 

// Program saved in FILE: p05.C 
#include "MDP_QCD.h" 
int mainO { 
start () ; 

int g; 

gauge_field U; 
U=hot() ; 

beta=6.0; // beta is declared in MDP_QCD.h 
for(g=0; g<100; g++) { 
inultihit(U,5,20) ; 

write(U, "gauge_conf ig_beta_6 . 0" , g) ; 

>; 

stopO ; 
return 0; 

}; 

3.3 First measurements 

It only makes sense to measure gauge invariant quantities, i.e. closed loops of 
links. The minimum closed loop is the plaquette (pi) 

P^^Ax) = Ui,(x)U^(x + i2)U^f,(x + At + d)U^^{x + 9) (8) 

which is a rank two tensor field. It is related to the chromo-electro-magnetic (em) 
tensor field through the relation 

= ^[Pf,,{x) + Pf,,{x-il) + P^,{x-i}) + Pf,,{x-il-9)] (9) 

- l[Pu,^ix) + P.i,{x-i2) + P,^{x-u) + P,^{x-i2-u)] (10) 
o 

In MDP_QCD there is a class for the plaquettes field 
pl_field P; 

and a class for the chromo-electro-magnetic field 
em_field G; 
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Figure 1: The graph shows the real part of the trace of the average plaquette 
computed on 100 gauge configurations. The plateau indicates the termalized 
configurations. 

For an arbitrary gauge configuration U these field are computed with the com- 
mands 

P=compute_plaquettes (U) ; 
G=compute_em_teiisor (P) ; 

An interesting measurement to be performed on each gauge configuration is the 
real part of the trace of the average plaquette: 

float av_pl; 
av_pl=average (?) ; 

What follows is a program that reads the gauge configurations created by 
p05 . C, computes the average plaquette and prints the result for each gauge con- 
figuration. 

// Program saved in FILE: p06.C 
#include "MDP_QCD.h" 
int mainO { 

start ; 

int g; 

gauge_field U; 
for(g=0; g<100; g++) { 

readCU, "/home/lattice/mdp/gauge_conf ig_beta_6.0" , g) ; 

printfC'/oi yof\n",g, average (compute_plaquettes (U) )) ; 

>; 
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stopO ; 
return ; 

}; 

The output of the program has been plotted and is shown in fig. (|I]). 

It shows that the average plaquette approaches an asymptotic value (which 
mainly depends on the value of /?) and then fiuctuates around it. This is the region 
where the field is said to be thermalized, i.e. when the statistical fiuctuations 
compensate the classical tendency to minimise the free energy. This is the region 
that corresponds to a physical quantum vacuum. Configurations that are not 
thermalized should be discarded. As is evident from the figure, in our case the 
first 60 saved configurations are not termalized. We will refer to the remaining 
40 as "effective configurations". 

A pl_f ield P can be accessed as x matrix P(x,mu,nu) or as complex 
number P(x,mu,nu,i, j). Analogously an em_field G can be accessed as an 
Nc X Nc matrix G(x,mu,nu) or as complex number G(x,mu,nu,i, j). In both 
cases mu must be less than nu. The other components can be obtained 
by antisymmetrizing. 

For a general review on the typical quantities that can be computed on lattice 
see ref. [|T5[. 

As an example the next program computes a Polyakov loop on our effective 
40 gauge configurations and prints out the result for each gauge configuration. 

// ProgrEun saved in FILE: p07.C 
#include "MDP_qCD.h" 
int mainO { 

start ; 

int g,xO; 

site x; 

gauge_field U; 

Matrix Polyakov(Nc ,Nc) ; 

for(g=60; g<100; g++) { 

read(U, "gauge_conf ig_beta_6 . 0" , g) ; 

Polyakov=l ; 

x=0; 

for(xO=0; xO<NxO; xO++) { 
Polyakov=Polyakov*U(x, 0) ; 
x=up [x] [0] ; 

>; 

print (Polyakov) ; 

>; 

stopO ; 
return ; 

}; 
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3.4 Introducing fermions 

Another set of observables that can be measured on the lattice are the two and 
three point correlation functions between currents (and their Fourier transform) 



C2{Q = j rf='x(0|J(0)jt(x)|0) (11) 

C?{t.,ty) = J d3x(iV(0|J(-y)Q(0)Jt(a;)|0) (12) 

Since the lattice metric is Euclidean, the asymptotic behaviour of the spatial 
Fourier transform of the two point correlation function is given by 

C^iQ ~ Z^e-"'^'^ (13) 

where rrij is the mass of the lightest state |lj) created by the current and 

Z= 1(1,1 Jt(0)|0)| (14) 



Therefore from the measurement of C2{tx) and its fit to (p!3D, it is possible to 
extract masses of particles. In the same fashion from the asymptotic behaviour 
of the ratio between the three and two two point correlation functions it is possible 
to extract matrix elements 0] 

C^it,,ty) _ 1 {lj\Q\lj) ^^^^ 



C2{tx)C2{ty) 2mj 

The most general current J{x) can be expressed in terms of fundamental 
fermionic fields q^ix) (the quark fields). In expressions like eq.(0) and (|T^) 
these field are Wick contracted 

{om^).qUym = su^,y) (16) 

Despite the fact that fermions are neglected when gauge configurations are cre- 
ated, they are reintroduced at a later stage as particles propagating in the gluonic 
background field. Therefore the two and three point correlation functions can be 
written as appropriate traces of propagators, S^^ij{x,y)[U], in the backgroud glu- 
onic field U. 

In MDP_QCD fermionic fields belong to the class f ermi_f ield. In the following 
line the fields ^^id are declared 

fermi_field chi , psi, phi; 

On each gauge configuration U, the fermion propagator S is computed by invert- 
ing the fermionic matrix 

S[U] = iQ[U])-' (17) 
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This is the most time expensive part of any lattice calculations. The fermionic 
matrix implemented comes from the standard Sheikoleslami-Wolhert action]^ p2| 



M=0..3 

- if^csw (yi,uG^y{x)5x^y (18) 

tJ.<U 

where k and Csw are input parameters, k is in one to one correspondence with 
the fermion mass 

m=^(--^\ (19) 

and Kcrit is a parameter depending on /3. The chiral limit corresponds to the limit 
K — ^ Kcrii, whcu the quark becames massless. In practice any inversion algorithm 
for eq. (p!7D converges slower and slower as the chiral limit is approached and 
this can never be reached, csw also depends on /? and its purpose is to cancel 
order a effects in the propagator (as well as in physical observables) . csw can 
be computed perturbatively (it is 1 at three level), can be extimated from the 
average plaquette (equivalent to resumming "tadpole" contributions) or can be 
obtained by a fine tuning procedure (called "non-perturbative" improvement). 

K and Csw are declared as global variables in MDP_QCD. The multiplication of 
a fermi_f ield by the fermionic matrix is performed by the function mul_Q() [] 

kappa=0 . 135 ; 
c_SW=0; 

G=compute_em_tensor(compute_plaquettes(U) ) ; 
psi=mul_Q(U, G, chi) ; 

Here the second line reads 

^ = QV\x (20) 

Note that the chromo-electro-magnetic field G^^y computed on the gauge config- 
uration U must be passed to the function mul_Q. This is done to speed up the 
computation of the so called "clover" term (the part of the action proportional 
to Csw)- 

The inverse operation is performed by the function 

chi=mul_invQ(U, G, psi, le-4) ; 



^ The Feynman rules derived for this action can be found in Note that in this paper 
the cr^jy matrices are different, for an i factor, from those implemented and listed in appendix 
A. 

^ Note that before calling mul_Q() it is necessary to assign a value to the global variables 
kappa and c_SW 
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The fourth argument is the required precision in the inversion. By default it 
is 10"^ 

Consider now the following program 

// Program saved in FILE: pOS.C 
#iiiclude "MDP_QCD.h" 
int mainO { 
start ; 

float precision=le-4; 
gauge_f ield U; 
em_field G; 

fermi_field chi , psi, phi; 
U=hot() ; 
beta=6 . ; 
multihit(U,l,20) ; 

G=compute_em_tensor (compute_plaquettes (U) ) ; 

kappa=0. 125; 
c_SW=1.77; 

chi=new_f ermi_conf ig(U,G) ; 

psi=mul_Q(U,G,chi) ; 
phi=mul_invQ(U,G, psi, precision) ; 
phi=phi-chi ; 

precision=abs (phi*phi) / (Nvol*Nc*Nc) ; 
printf ( "precision=7„f \n" .precision) ; 
stopO ; 
return 0; 

}; 

It creates a gauge configuration U, a random fermionic configuration x and com- 
putes 

^ = Q[u]x (21) 

<P = Q-^[U]^-x (22) 



(^)1>L(^) (23) 



It then checks whether 6 is consistent with the input precision. Note the command 

chi=new_f ermi_conf ig(U,G) ; 

it returns a random fermi_field x with probability 

P[x] = exp[-x^Q^[U]Q[U]x] (24) 

Again, a fermi_field psi can be accessed as an iVc x 4 matrix, psi(x) or 
as complex number psi (x , i , alpha) . 



A more difficult task is the computation of the propagator of eq. (|T6|) from 
all points to all points. 
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3.5 Stochastic light propagators 



The light propagator of eq. (|T6D can be computed, for each gauge configuration, 
as an average 



S%{x,y)[U] = {v^!^\x) {xfiy))*)^ (25) 

where for each k 

Xt'^l = Q[U]ip^'^ (26) 

The fermi_f ield configurations (y^''^' must be generated stochastically Q with 
probability 

P[ip]=e^v{-V^Q^[U]Q[U]v) (27) 

In MDP_QCD light propagators are objects of the class light _propagator. A 
typical light propagators, S", is declared as 

light_propagator S; 
Each light_propagator S contains two member arrays of fermi_f ield 

f ermi_f ield S .psi [Nf ermi] ; 
fermi_field S . chi [Nf ermi] ; 

These arrays can be created^ for a fixed gauge configuration, with the com- 
mand: 

S . stochastic_generation(U,G) ; 

where U is the gauge field and G if the electromagnetic tensor. Here is how it is 
implemented 

light_propagator : : stochastic_generation(gauge_f ield U, em_field G) { 
register site k; 

printf ( "Start stochastic generation: Nf ermi=7oi\n" , Nf ermi) ; 
for(k=0; k<Nfermi; k++) { 
printf ("Step 7.1: ",k); 
psi [k] =new_f ermi_conf ig(U,G,0 . 0001) ; 
chi [k] =mul_q(U,G,psi [k] ) ; 

>; 

printf ( "Stochastic generation terminated. \n" ) ; 

>; 



^ The technique for generating these configurations has been recently carried one step for- 
ward and combined with a "maximal variance reduction" method to reduce the statistical noise 
jl^. This method is not implementd in MDP_QCD. 

^Note that before generating or reading a light_propagator it is necessary to assign a value 
to the global variables kappa and c_SW which are used by mul_Q() . 
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This code is the time expensive, therefore it may be a good idea to save the 
fermionic configurations once they have been generated. 

In the following program a stochastic propagator is created and saved for the 
60th of our gauge configurations. 

// Program saved in FILE: p09.C 
#include "MDP_QCD.h" 
int mainO { 

start ; 

gauge_field U; 

em_field G; 

light_propagator S; 

readCU, "gauge_conf ig_beta_6 . 0" , 60); 
kappa=0. 1345; c_SW=1.77; 

G=compute_em_tensor (compute_plaquettes (U) ) ; 

S . stochastic_generation(U,G) ; 

write (S , "propagator . 1345 . 177 . " , 60) ; 

stopO ; 

return ; 

}; 

There are two basic ways to extract informations from a stochastic_propagator 

S. 

• S (x , i , y , j , k) returns the spin Matrix Ngpin x ^spm 

^L'^^(^) {xf\y)y (28) 

• S (x , i , y , j ) returns the same spin Matrix N^pin x A^^pm averaged on the 
spinorial configurations. 

{vm^)(xf{y)y)^ (29) 

The following program computes, for just one gauge configuration, the two 
point correlation function of eq. (0) when J{x) is the current Iri^q. Note that 
the propagator for the b is computed in the static approximation, i.e. as product 
of links times |(1 + 7°). The light propagator is read from a file. 

// Program saved in FILE: plO.C 
#include "MDP_QCD.h" 
int mainO { 

start ; 

gauge_field U; 

em_field G; 

light_propagator S; 

site x,y,z; 

int i, j ,L=2; 
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Matrix H(Nc,Nc) ; 
Matrix A; 
Complex result=0; 

read(U, "gauge_coiif ig_beta_6 . 0" , 60); 
G=compute_em_tensor (compute_plaquettes (U) ) ; 

kappa=0 . 1345 ; c_SW=1.77; // They must correspond to the S saved! 
read(S,U,G, "propagator. 1345. 177. " ,60) ; 
x=0 ; y=move (x , , L) ; 
H=l; 

A=Gamma5*0 . 5* ( 1+Gamma [0] ) *Gamma5 

for(z=x; z!=y; z=up [z] [0] ) H=H*hermitian(U(z,0)) ; 
for(i=0; i<Nc; i++) 
for(j=0; j<Nc; j++) 

result=result+trace(S(y,i,x, j)*A)*H(j ,i) ; 
printf ( "result=7of + "/oflXn", real(result) , imag(result) ) ; 
stopO ; 
return 0; 

}; 

Some care is necessary when simulating a process with more than one hght 
quark. The safe way is to define many stochastic propagators corresponding to 
the different contractions. To save memory and time it is possible to use just one 
light_propagator to compute all of the "physical" propagators (assuming the 
corresponding quarks are degenerate in mass). In this case the average of eq. (^) 
cannot be computed independently for each of the propagators and unwanted 
correlations must be somehow eliminated. 

3.6 Smearing 

To improve the overlap between a current, j\ and the physical state one wants 
to study, it is common to smear the light propagator. To this scope the 

gauge invariant "Wupperthal" smearing||20|| is implemented 

float epsilon=0 . 25 ; 
int num_iter=60; 

psi=smearing(chi ,epsilon,num_iter) ; 

This smears the x fi^ld with a smearing parameter e, iterates the procedure 
num_iter times, then saves the smeared field in tp. 
Here is how the function smearing is implemented 

fermi_field smearingCf ermi_f ield psi, gauge_field U, 

float epsilon, int level) { 

fermi_field phi; 
int n; 

register site x; 

printf ("Smearing fermionic configuration. . .\n") ; 
for(n=0; n<level; n++) { 
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if(n>0) psi=phi; 
for(x=0; x<Nvol; x++) 
phi (x) =psi (x) +epsilon* 
(U(x,l)*psi(up[x] [1]) + 
U(x,2)*psi(up[x] [2])+ 
U(x,3)*psi(up[x] [3])+ 

herinitian(U(dw[x] [1] , 1) ) *psi (dw [x] [1] ) + 
hermitian(U(dw[x] [2] , 2) ) *psi (dw [x] [2] ) + 
hermitiaii(U(dw[x] [3] ,3))*psi(dw[x] [3] ) ) ; 

}; 

prepare (phi) ; 
return phi ; 

}; 

To smear the sink (and/or the source) of a stochastic propagator it is necessary 
to apply the smearing function to its member fields 

for(n=0; n<Nferini; n++) { 

S.psi [n]=smearing(S.psi [n] ,epsilon,niim_iter) ; // smear sink 
S.chi [n]=smearing(S.chi [n] ,epsilon,niiin_iter) ; // smear source 

} 

3.7 Examples: fuzzing and cooling 

Fuzzing is a kind of gauge inveriant smearing that is not exphcitely implemented 
in MDP_QCD but there are different ways of doing it. It essentially consists of 
performing an ordinar smearing using a a "fuzzed" gauge configuration instead 
of the normal one. Here is how it could be implemented 

// EXAMPLE CODE: function not implemented! 

fermi_field fuzzing(f ermi_f ield psi, gauge_field U, int num_iter) { 
fermi_field phi; 
gauge_field Ufuzzed; 
site x; 

int mu , i ; 

for(i=0; i<num_iter; i++) 
for(x=0; x<Mvol; x++) 

for(mu=0; mu<Ndim; mu++) {_ 

Ufuzzed (x , mu) =pro j ect_on_SUn (U (x , mu) +0 . 25*staple (U , x ,mu) ) ; 

}; 

phi=smearing (psi , Ufuzzed, . 25,2) ; 
prepare (phi ) ; 
return phi; 

}; 

The function project_on_SUn is not implemented in MDP_QCD and there are 
different ways to do it. Its job is to return some kind of projection of the matrix, 
which is passed as argument, on the SU{Nc) group. This procedure is called 
"cooling" . One way of implementing it is 
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// EXAMPLE CODE: function not implemented! 
Matrix project_on_SUn (Matrix A) { 

Matrix logA, B(Nc,Nc); 

int i ; 

B=0; 

logA=log(A) ; 

for(i=l; i<=Nadj ; i++) 

B+=Generator [i] *real (0 . 5*trace (Generator [i] *logA) ) ; 
B=exp(B) ; 
prepare (B) ; 
return B ; 

>; 

Note that Generator [i] form a basis of SU{Nc) and are built in in MDP_QCD. 
This code is very slow, but it can be speeded up by decreasing the precision in the 
computation of logO and expO for matrices. This is set in the flag PRECISION 
declared in the file MDP_Libl.h. In any case this is usually not the critical part 
of a lattice simulation. 

3.8 Basic algorithms: multihitO 

The multihit code consists of the following iteractive procedure [0] |^ 
► For each link of the lattice 

(1) Generate a random matrix 

(2) Compute the variation in the action, 6S 

(3) If 6S > X, where x is random real number, substitute the new link to the 
old one 

O Repeat steps {1,2,3} n_hits times 
O Repeat from ► n_iter times 
This algorithm is implemented in the following code 

void multihit (gauge_field U, int n_hit=10, int n_iter=l) { 
register int i .hit ,mu,nu; 

register site x; 

register double delta_action, efficency; 
long updated; 
Matrix Unew.UUU; 
for(i=0; i<n_iter; i++) ■[ 

printf ("Multihit step n.7,i, beta=y.f . . . \n" , i ,beta) ; 

updated=0 ; 

for(x=0; x<Nvol; x++) { 
for(mu=0; mu<4; mu++) { 
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UUU=staple(U,x,mu) ; 
for(hit=0; hit<n_hit; hit++) { 
Unew=Random.SU(Nc)*U(x,inu) ; 
delta_action=exp(beta/Nc* 

real (trace (UUU*hermitian(Unew-U(x,mu) ) ) ) ) ; 
if (delta_action>Random.Float ) { 
U(x,inu)=Unew; 
updated++ ; 
hit=n_hit ; 

>; 

>; 

}; 

>; 

ef f iceiicy=100 .0*updated/(Nvol*Ndim) ; 
printfC'... Efficency: 7, . 2f y,y,\n" , eff icency) ; 

}; 

}; 

The parameter called efficiency, computed by this function and printed 
on the standard output, is the ratio between the number of links that has been 
updated on each multi-hit step and the total number of links. 



3.9 Basic algorithms: mul_Q() and iiiul_invQ() 

The function mul_Q() performs multiplication by the fermionic matrix as it ap- 
pears in the SW action. Here is a possible implementation: 

feniii_field niul_Q(gauge_f ield U, ein_field G, 

f eriiii_f ield psi, int sign=l) { 
// Slow Version! 
site x; 
int mu,nu; 
feniii_field tmp; 
tmp=psi ; 

for(x=0; x<Nvol; x++) 

for(mu=0; mu<Ndim; mu++) { 

tmp(x)-=kappa*mul_lef t (l-sign*Gamina[inu] , 
U(x,mu)*psi(up[x] [mu])) 
+kappa*mul_left(l+sign*Gamina[inu] , 

hermitian(U(dw [x] [mu] ,mu) ) *psi (dw [x] [mu] ) ) ; 
for(nu=mu+l; nu<Ndim; nu++) 
tmp (x)-=kappa*c_SW*left_mul (Sigma [mu] [nu] , 
G(x,mu,nu)*psi(x)) ; 

>; 

prepare (tmp) ; 
return tmp; 

}; 

This would work hut, in practice, the real implementation has been optimized 
to reduce the number of loops. 
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In the real implementation the multiplication when csw 7^ is only 80% 
slower than when csw = (because the "clover" term is not computed). 
The command 



psi=inul_invQ (U, G, chi, delta); 



returns ip = Q^^[U]x- The inversion is performed using the technique of minimal 
residue [If. Note that the electromagnetic tensor G is passed as an argument 
as well. This is redundant but it is much faster if many inversions have to be 
performed on the same gauge configuration and csw is different from zero. 
The minimal residue scheme consists of the following iterative procedure 

► 00 = X, ^"0 = X - Q[u]x 
{L) ai- (^Q[u]r,y{Q[u]n) 
(2) (pi+i = 0i + aiTi, Ti+i = ri- aiQ[U]ri 

O iterate step {1, 2} until \rif is less then the required precision (delta/ (Nvol*Nc*Nc)^ 

■ (pi converges towards ip. 



Here is how it is implemented 

fermi_field mul_invQ (gauge_f ield U, em_field G, fermi_field chi, 

float delta_res=0.0001, int sign=l) { 
printf ("Inverting the Q matrix with kappa=7tf , c_SW=yof ...\n", 

kappa, c_SW) ; 
f ermi_f ield psi, res, Qres; 
float residue; 
Complex alpha; 
int step=0; 
psi=chi ; 

res=chi-mul_Q(U,G, chi, sign) ; 
do { 

Qres=mul_Q (U , G , res , sign) ; 
alpha=(Qres*res)/ (Qres*Qres) ; 
psi=psi+alpha*res ; 
res=res-alpha*Qres ; 
residue=real(res*res)/ (Nvol*Nc*Nc) ; 
step++ ; 

printf (" Step 7„i =>7,f \n" , step, residue); 
} while (residue>delta_res) ; 

printf ("... 7oi steps, precision reached=7oe\n" , step, residue); 
prepare (psi) ; 
return psi; 

}; 
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3.10 Input/Output 

One gauge configuration U can be saved with the command 
write(U, "filename", g) ; 

where g is an integer. Assuming for example that g is 36, the configuration is 
saved in a file called filename . 00036. The command to load this file in a gauge 
configuration U is 

read(U, "filename", g) ; 

Using properties of SU {Nc) the configuration is compressed when it is saved and 
automatically decompressed when it is read. After the compression each link 
occupies 12{N^ — N^) bytes instead of IQN^- 

Fermionic configurations are saved and loaded with the commands 

write (psi, "filename", g, n) ; 
readCpsi, "filename", g, n) ; 

to/from a file filename .g.n. 

A light_progator S is saved with the command 

write(S, "filename", g) ; 

which calls 

for(n=0; n<Nfermi; n++) write (S .psi [n] , "filename", g, n) ; 

It can be read with 

G=compute_em_tensor (compute_plaquettes (U) ) ; 
read(S, U, G, "f ilename" ,g) ; 

Note that it is necessary to pass the gauge field and the electromagnetic field to 
this function. They are necessary to rebuild the S . chi [Nf ermi] fields from the 
S .psi [Nf ermi] . U must be exactly the same gauge configuration on which the 
S. phi [Nf ermi] fields had been generated. 
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object class 


memory (bytes) 


Matrix 
gauge_f ield 
pl_field 
ein_f ield 
f ermi_f ield 
light -propagator 


11 + rows X columns 

5 + 32 X N^olume X 
5 + 96 X N^olume X 
5 + 96 X N^olume X A^.^ 

5 + 32 X Nyolume X 

(5 + 32 X Nyolume X Nc) X2X Nfermi 



Table 3: Memory usage for the different fields. Nyoiume is for the total number of 
lattice sites. 



lattice 


Uil) 


SU{2) 


SU{3) 




SUi5) 


2^ X 3 


0.03 


0.03 


0.10 


0.33 


0.47 


4=^ X 6 


0.28 


0.50 


1.21 


2.72 


5.99 


6^ X 9 


1.25 


2.65 


6.18 


14.54 


32.94 


8^ X 12 


3.84 


8.02 


20.07 


45.18 


93.65 


12^ X 18 


20.14 


38.79 


98.07 







Table 4: Time in seconds to perfrom one multihit step (1 hit per hnk) on a SUN 
UltraSPARC 5. 



lattice 


U{1) 


SU{2) 


SU{3) 




SU{5) 


2^ X 3 


0.05 


0.07 


0.13 


0.24 


0.43 


43 X 6 


0.46 


1.08 


2.20 


4.19 


6.36 


6^ X 9 


2.24 


5.34 


11.20 


19.37 


31.95 


8^ X 12 


7.14 


14.87 


35.80 


62.24 


105.00 


12^ X 18 


35.74 


84.32 


174.90 







Table 5: Time in seconds to generate one random fermionic configuration (it 
involves one call to mul_invQ() with a precision of 10~^), far from the chiral 
hmit. Computed on a SUN UltraSPARC 5. 
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3.11 Memory and speed 

Tab. d^) shows the required memory to store each of the fundamental objects (as 
function of the lattice size defined in MDP_settings .h). 

The function mul_invQ () allocates 3 f ermi_f ield and one of them is returned. 

All the other functions have been optimized and essentially they do not use 
more memory than that allocated to their arguments plus the memory required 
to store the argument that is returned. It is important to remeber that when a 
field is returned it is NOT copied, but only the pointer to the physical memory 
is copied. In other words the memory occupied by a field object that has to be 
returned by a function is the same memory where the field will be stored after 
the function has returned. 

Since the time required to copy memory has been reduced to its minimum 
any function declared in MDP_QCD.h is very fast. MDP_QCD enables to run signi- 
ficative lattice simulations on a SUN UltraSPARC or on a Pentium II PC. Some 
benchmarks executed are reported in tab. @ and tab. (|^) for the most common 
operations. 

It is important to stress that MDP_QCD does not include any optimization for 
parallel machines and many of the trick that are implemented may result in a 
slowing down on parallel computers without shared memory. 
Internally all the fields and matrices are seen as one dimensional arrays of Complex. 
This gives some benefit on vectorial machines providing the compiler is able to 
optimize it. 
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Lattice notation and conventions 



Equivalence between Lattice Euclidean and Minkowsky Space quantities 
(where a is the lattice spacing). 



Euclidean Lattice 


Minkowsky Space 








—1 ?■ 
a X 






Qi 


adi 




—iaAo 


A' 


aAi 












l\ 











Metric 



Pauli matrices 



-5^" = diaK(-l, -1,-1,-1 



sigma[l] = (71 = 
Sigma [2] = a2 = 
Sigma [3] = 0-3 = 
they undergo the commutation relation 



1 

1 

-i 

1 

1 
-1 



(30) 



[(Ti, aj] = 2ie'^''ak 

Dirac matrices in Dirac representation (the same convention of 
UKQCD) 



Gamma [0] 
Gamma [i] 
Gamma5 



7 



7 



7 



1 

-1 

-zcTj 
iai 

1 

1 



(31) 

(32) 
(33) 
(34) 

(35) 
and 

(36) 
(37) 
(38) 
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Gammal 
Sigma [mu] [nu] 
Pleft 
Pright 



111/ 



Pi 



left 



Pright 



_ 1 - 7^ 
~ 2 
1 + 7^ 



(39) 
(40) 
(41) 
(42) 



Note that all the 7 and a matrices are hermitian and, by definition, 7^ 



Traces 



where e^^^s = ^0123 ^ 
Gell-Mann matrices. Lambda [i] 

= 







A^ = 

Discrete symmetries [p^j: 
P ■■ S%{x,y)[U] 






1 




1 





:) 










1 













1 

















—i 











i 























—i 





i 






A^ 




C 
T 
H 



(43) 
(44) 
(45) 
(46) 



SU^,y)[U] = ilV)aa'S%ix'',y^)[U^]iYYh'p 

U^, W-^ , U'^ are the parity reversed, charge conjugate, time reversed 
configuration respectively. 



(47) 
(48) 
(49) 
(50) 



(51) 
(52) 
(53) 
(54) 

j;auge 
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Basic syntax for MDP_QCD 



Classes defined in the file MDP_Libl.h included by MDP_QCD.h: 



class Toy_class; 

class Matrix; 

class Random_generator ; 

class JackBoot; 



Classes defined in the file MDP_QCD . h : 



class gauge_field public: Toy_class<Complex, Nvol*Ndim*Nc*Nc> ; 

class pl_field public: Toy_class<Complex, Nvol*6*Nc*Nc> ; 

class em_field public: Toy_class<Complex , Nvol*6*Nc*Nc> ; 

class fermi_field public: Toy_class<Coinplex, Nvol*Nc*Nspin> ; 

class light_propagator; 



The elements of each field-type class can be accessed as matrices 

Matrix gauge_f ield: : operatorO (site x, int mu) ; 

Matrix em_f ield: : operator () (site x, int mu, int nu) ; 

Matrix pl_f ield: : operator () (site x, int mu, int nu) ; 

Matrix fermi_f ield: : operator () (site x) ; 



or as complex numbers 



Complex gauge_f ield: : operatorO 
Complex em_f ield: : operatorO 

Complex pl_f ield: : operatorO 

Complex fermi_f ield: : operatorO 



(site X, int mu, int i, int j); 
(site X, int mu, int nu, 

int i , int j ) ; 
(site X, int mu, int nu, 

int i , int j ) ; 
(site X, int i, int alpha); 



To save memory em_f ield and pl_f ield only store those with components 
mu<nu. The class light propagator has peculiar class members and they 
are discussed in the proper section. 

Global functions: 



site move(site x, int mu, int Imu) ; 

site move(site x, int mu, int Imu, int nu, int lu) ; 

site coordinate_space(x) ; 

site position(int xO, int xl, int x2, int x3) ; 
site position(int xO, site x_space) ; 
float dist(site x, site y) ; 
void start ; 
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void stopO ; 

Matrix plaquette(gauge_f ield U, site x, int mu, int nu) ; 
Matrix staple_up(gauge_f ield U, site x, int mu, int nu) ; 
Matrix staple_dw(gauge_field U, site x, int mu, int nu) ; 
Matrix staple (gauge_f ield U, site x, int mu, int nu) ; 
float average_plaquette (gauge_f ield U) ; 
gauge_f ield coldO ; 
gauge_field hotO; 
float action(gauge_f ield U) ; 

void multihit(gauge_field U, int n_steps, n_hits) ; 
pl_field compute_plaquettes (gauge_f ield U) ; 
em_f ield compute_em_tensor (pl_f ield Up) ; 
fermi_field mul_Q(gauge_f ield U, em_field T, 

fermi_field psi, int dagger=l) ; 
fermi_field mul_invQ (gauge_f ield U, em_field T, 

fermi_field psi, 

float delta=le-4, int dagger=l) ; 
fermi_field new_f ermi_conf ig(gauge_f ield U, em_field T) ; 
fermi_field smearing(f ermi_f ield psi, gauge_field U, 

float epsilon, int level) ; 

Global functions for Input /Output: 

void write (gauge_f ield U, char filename[], int gauge); 
void read (gauge_f ield U, char filename[], int gauge); 
void write (fermi_f ield psi , char f ilencone [] , int gauge, int nf ermi) ; 
void read(fermi_f ield psi, char filename [], int gauge, int nf ermi) ; 
void write (light_ptopagator S, char filenajne[], int gauge); 
void read (light _propagator S, gauge_field U, em_field T, 
char filename[], int gauge); 

Global constants (defined in MDP_Settings .h, included by MDP_QCD.h). 
They can be modified by the programmer. 





C++ with MDP_QCD 


Value 


lattice sites in time 


NxO 


8 


lattice sites is xi 


Nxl 


4 


lattice sites is X2 


Nx2 


4 


lattice sites is 


Nx3 


4 


n° colors 


Nc 


3 


n° fermi configs 


Nf ermi 


10 



(55) 



Global constants that should not be modified by the programmer and can 
be used in main and in functions. 
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C++ with MDP_QCD 


Vahic 


lattice sites in space 


Nspace 


Nxl*Nx2*Nx3 


lattice sites in total 


Nvol 


NxO*Nspace 


n° dimensions 


Ndim 


4 


n° spin components 


Nspin 


4 


elements in adjoint 


Nadj 


Nc*Nc-l 




DAGGER 


-1 


new type 


site 


long 


logical operator 


and 


&& 


logical operator 


or 


1 1 



(56) 



• Global variables: 





C++ with MDP_QCD 








beta 


float 








kappa 


float 




csw 




c_SW 


float 








up [x] [mu] 


site 




X — Jl 




dw[x] [mu] 


site 








CO [x] [mu] 


int 




7;t=0..3 




Gamma [mu] 


Matrix 


4x4 


7^ 




Gamma5 


Matrix 


4x4 






Sigma [mu] [nu] 


Matrix 


4x4 




1) 


sigma[i] 


Matrix 


2x2 


^c=0..iV| (^0 


1) 


Lambda [a] 


Matrix 


3x3 


= 1) 


Generator [c] 


Matrix 





The variables f3, k and csw must be initialized in the main program before 
they are used. Note that the matrices uj'^='^"'^'' form a general basis of 
Hermitian matrices for SU (iVc) . They coincide with the Pauli matrices for 
Nc = 2, but not with the Gell-Mann matrices for Nc — 3. 
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Figure 2: When the object x is returned the copy constructor (CC) is called and 
it creates a temporary object. The objects are represented as circles. 

C Toy_class and how it works 

Some books |I| about C++ warn the reader about the problems of returning ob- 
jects. In fact when an object is returned by a function, it is copied by the copy 
constructor into a temporary object. When the function terminates the tempo- 
rary object is returned (fig. |^). 

If the original object to be returned contains a dynamically allocated pointer 
(p) then its value is copied into the temporary object but the memory pointed by p 
is not. The problem arises when the function terminates because the destructor 
of the original object is called and the memory pointed to by p is deallocated 
(fig. 0). Hence the temporary object contains a pointer to a location of memory 
which is no longer allocated. 

When this situation occurs, there is sometimes a runtime error such as "bus 
error" but often the program continues to run accessing to locations of memory 
containing random data without protection. 

Consider for example the following code: 

// THIS CODE IS WRONG! ! ! 

template <class T, long imax> class Toy_class { 
public : 
T *in; 

Toy_class() { 

m=new T [imax] ; 

for(i=0; Kimax; i++) m[i]=0; 

>; 

>; 

Toy_class f { // function f () 

Toy_class<int , 10> x; 
x.p[3]=5; 
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Figure 3: The temporary object created by the copy constructor (CC) is assigned 
to the argument of the calhng function, then the original object x is destroyed 
and the memory is deallocated. 



return x; 

>; 

T g(Toy_class<iiit , 10> y) { // function g() 

return y.p[3] ; 

>; 

int mainO { // main program 

cout « g(f ()) ; 
return ; 



It is supposed to print '5' on the screen, but it may give any random result: 
it is not safe! 

There are two standard ways to overcome this problem: 

• Redefine the copy constructor so that it will allocate new memory and will 
copy p[i] for every i. 

Toy_class (Toy_class &x) { 
if(p!=0) delete [] p; 
p=new T [imax] ; 

for(int i=0; i<imax; i++) p[i]=x.p[i]; 

>; 

This is pretty safe, but it may force you to copy a huge amount of memory 
when it is not necessary (fig. |^). 

• Write all the functions so that their arguments are passed and returned by 
reference (so that the copy constructor is never called). Write the destructor 
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Figure 4: A possible solution: The memory is copied by the copy constructor 
before the destructor of x is called. 

in such a way that, when an object is destroyed, the memory pointed to 
by p is not deallocated. It is also necessary to keep trace of the allocated 
memory and deallocate it somewhere in the program. This works, but only 
in very simple cases. This technique will become intractable in the presence 
of recursive functions which pass and return objects. 

In this appendix a solution to this problem is proposed, based on the idea 
of attaching a FLAG to each object. The FLAG will contain information about 
the status of the object and it will be used to implement, in the most general 
framework, a simple and efficient way of passing and returning objects of any 
kind, even in recursive structures. Moreover all the tricks due to the dynamical 
allocations will be hidden within the basic methods of the class (constructor, copy 
constructor, destructor and assignment operator). 

The word "efficient" here means that the program will automatically take 
care of the memory used and nothing will be copied if it is not necessary, hence 
the code will be optimized both in speed and memory usage. 

Suppose that one attaches a FLAG, i.e. a new member variable, to each object 

enum value {F, T, C, H} FLAG; 

and one uses it in the following way: 

The FLAG is always set to (F)REE by the constructor. If the object is not 
going to be returned its FLAG will remain unchanged and the destructor will 
deallocate the memory pointed by its pointer when it is called. If the object is 
going to be returned its FLAG is changed to (R)ETURN and its destructor will 

The technique that wiU be explamed is used to implement the class Matrix, gauge_f ield, 
pl_f ield, em_f ield and f ermiJield. 
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X 



/( )- 



'. ^^^^ 



CC 



Figure 5: The destructor of x knows from the flag R that this object is going 
to be returned and it does not deallocate the memory. The temporary object y 
carries an F flag, hence it will be deallocated later. 

not deallocate the memory in this case. On the other side the copy constructor 
that is going to create the temporary object will check the FLAG of its argument 
and if it is R, it will set the FLAG of the temporary object to F (fig. 

Moreover it is necessary to take care of the possibility of passing an F object 
as argument of a function: in this case the copy constructor will generate a 
temporary object with a C FLAG and the destructor will be implemented in such 
a way that the memory pointed by a C will never be deallocated. 

In other words a F object is a normal object that sooner or later is going to 
be destroyed (in the same function, or method, where it has been created), while 
a C one is a copy. Its pointer is pointing to memory allocated by someone else 
(a F object existing at an higher level). On the other side an object R exists 
only in the brief instant between the moment when the copy constructor is called 
and when its own destructor is called. The memory that it is pointing to is not 
deallocated because the new F temporary object created by the copy constructor 
will contain a reference to it. Such a memory will be deallocated later when the 
destructor of the temporary object will be called somewhere automatically. The 
three possible situations are illustrated in fig. ^ (left). 

To make the FLAG usage even safer it is better to define it as a private member 
and define a function (prepare ()) that sets the FLAG of the object to R before 
it is returned. 

A FLAG value (H)yperlink allows an object to contain a pointer to an address 
of memory allocated by an object of a different class. So that this memory will 
never be deallocated by the destuctor of the object. 

According with these prescriptions the program class Toy_class should be 
defined in the following way: 
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Figure 6: Different kinds of objects that may be passed to (and returned by) a 
function are represented. In these diagrams the copy constructor is automatically 
called when the arrow crosses the boundary of a box (representing a functional 
level in the program). 



template <class T, long imax> 
class Toy_class { 
public : 

enum value {F.R.C.H} FLAG; 

T *in; 

Toy_class() { 

register long i; 
FLAG=F; 

in=new T [imax] ; 

for(i=0; i<imax; i++) m[i]=0; 

>; 

Toy_class(T *mO) { 
FLAG=H; 
in=mO; 

>; 

Toy_class (const Toy_class &a) { 
in=a.in; 

switch(a.FLAG) { 
case F: FLAG=C; break; 
case R: FLAG=F; break; 
case C: FLAG=C; break; 
case H: FLAG=H; break; 

>; 
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~Toy_class() { 

if((FLAG==F) && (m!=0)) delete [] m; 

>; 

operator= (Toy_class a) { 
register long i; 
switch (FLAG) { 
case H: 

for(i=0; Kimax; i++) m[i]=a.m[i] ; 
break; 
case F: 

if (a.FLAG==F) { 

if ((in!=0) && (m!=a.m)) delete [] m; 

m=a.m; 

a. 01=0; 
} else { 

for(i=0; Kimax; i++) m[i] =a.m[i] ; 

>; 

break; 
case C: 

if (a.FLAG==F) { 

iii=a.ni; 

a . m=0 ; ; 
> else { 

in=new T [imax] ; 

for(i=0; Kimax; i++) m[i]=a.m[i]; 

>; 

FLAG=F; 
break; 
case R: 

errorC'What the Hell are you doing?"); 
break; 

}; 

>; 

inline friend void prepeire (Toy_class &a) { 
register long i; 
T* m; 

switch (a. FLAG) { 

case F: a.FLAG=R; breeik; 

case R: 

printfC'You shoud not call prepareO; twice\n"); 
break; 
default : 

m=new T [imax] ; 

for(i=0; Kimax; i++) m[i]=a.m[i]; 

a . m=m ; 

a.FLAG=R; 

break; 

>; 

>; 

>; 
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Now the following program works! 

Toy_class f() { // function f() 

Toy_class<int , 10> x; 
x.p[3]=5; 
return x; 

>; 

T g(Toy_class<int , 10> y) { // function g() 
return y .p[3] ; 

>; 

int mainO { // main program 

cout « g(f ()) ; 
return ; 

>; 

Note how the information contained in the FLAG has been used by the func- 
tions acting on the objects (in particular by the assignment operator). To opti- 
mize both speed and memory usage it is required for a C object (i.e. the copy 
made by the copy constructor of an F object existing at a higher level, see fig. ^ 
(right)) to be copied element by element into a new location of the memory, oth- 
erwise only its pointer is copied (the fastest way). Moreover when a new value is 
assigned to a C type object it is promoted to type F. 
As an exercise the reader is suggested to work out the deatils. 
Once the class with the FLAG has been implemented then all these technicalities 
can be forgotten but the safety rules, stated in the next appedinx, should always 
be kept in mind! 

D Safety rules 

Since the copy constructor has been redefined and it is now FLAG dependent there 
are a few safety rules to follow to be sure that everything is working properly. 
These safety rules applys to objects of class: 

Toy_class 
Matrix 
gauge_f ield 
pl.field 
eni_f ield 
f ermi_f ield 

These safety rules are: 

Always, before returning an object, change its FLAG to R, i.e. call 
prepare (o^jecO ; 

Never explicitly use the copy constructor. 
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if Do not call the Eissignment operator of the Eirgument of a function 
within the function itself (unless you understood completely how 
Toy_class works). 



Attention: Different machines may have a different internal representations 
of real numbers. Therefore some care must be taken when copying data between 
different machines. This also holds for the file containing the seed: 
RandomBuf f er . seed 
Therefore: 

if Do not copy the seed from one machine to another, let the start () ; 
instruction create a new seed file. 
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